Multi-resolution two-sample comparison through the divide-merge Markov tree
We introduce a probabilistic framework for two-sample comparison based on a
nonparametric process taking the form of a Markov model that transitions
between a "divide" and a "merge" state on a multi-resolution partition tree of
the sample space. Multi-scale two-sample comparison is achieved through
inferring the underlying state of the process along the partition tree. The
Markov design allows the process to incorporate spatial clustering of
differential structures, which is commonly observed in two-sample problems but
ignored by existing methods. Inference is carried out under the Bayesian
paradigm through recursive propagation algorithms. We demonstrate our method
on simulated data and a real flow cytometry data set, and show that it
substantially outperforms other state-of-the-art two-sample tests in several
settings.
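The divide-merge idea can be sketched in miniature: on a dyadic partition tree of [0, 1], each node compares a "merge" model, in which both samples share one probability of falling into the left child, against a "divide" model with sample-specific probabilities, scored via Beta-Binomial marginal likelihoods. This is only a toy stand-in for the paper's Markov-transition prior and recursive propagation algorithms; the uniform Beta priors, depth cap, and simulated data are illustrative assumptions.

```python
import math
import random

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_bb(left, total, a=1.0, b=1.0):
    # Beta-Binomial marginal likelihood of `left` points falling in the
    # left child out of `total`, under a Beta(a, b) prior on the split rate
    return log_beta(a + left, b + total - left) - log_beta(a, b)

def divide_scores(x, y, lo=0.0, hi=1.0, depth=0, max_depth=3):
    """Walk a dyadic partition tree of [lo, hi) and, at each node, score a
    'divide' model (each sample has its own left/right split rate) against
    a 'merge' model (one shared rate). Returns (interval, log BF) pairs."""
    if depth == max_depth or len(x) + len(y) == 0:
        return []
    mid = (lo + hi) / 2.0
    xl = [v for v in x if v < mid]
    yl = [v for v in y if v < mid]
    merged = log_bb(len(xl) + len(yl), len(x) + len(y))
    divided = log_bb(len(xl), len(x)) + log_bb(len(yl), len(y))
    out = [((lo, hi), divided - merged)]
    xr = [v for v in x if v >= mid]
    yr = [v for v in y if v >= mid]
    out += divide_scores(xl, yl, lo, mid, depth + 1, max_depth)
    out += divide_scores(xr, yr, mid, hi, depth + 1, max_depth)
    return out

# two samples that differ only in how mass splits between the two halves
random.seed(0)
x = [random.uniform(0.0, 1.0) for _ in range(400)]
y = [random.uniform(0.0, 0.5) if random.random() < 0.7
     else random.uniform(0.5, 1.0) for _ in range(400)]
scores = divide_scores(x, y)
top = max(scores, key=lambda s: s[1])
print(top)
```

Nodes with a large positive log Bayes factor are where the two samples split differently, which is the sketch's analogue of inferring a "divide" state at that location and scale.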
Choosing a Proxy Metric from Past Experiments
In many randomized experiments, the treatment effect of the long-term metric
(i.e. the primary outcome of interest) is often difficult or infeasible to
measure. Such long-term metrics are often slow to react to changes and
sufficiently noisy that they are challenging to estimate faithfully in
short-horizon experiments. A common alternative is to measure several
short-term proxy metrics in the hope that they closely track the long-term
metric -- so they can be
used to effectively guide decision-making in the near-term. We introduce a new
statistical framework to both define and construct an optimal proxy metric for
use in a homogeneous population of randomized experiments. Our procedure first
reduces the construction of an optimal proxy metric in a given experiment to a
portfolio optimization problem which depends on the true latent treatment
effects and noise level of the experiment under consideration. We then denoise the
observed treatment effects of the long-term metric and a set of proxies in a
historical corpus of randomized experiments to extract estimates of the latent
treatment effects for use in the optimization problem. One key insight derived
from our approach is that the optimal proxy metric for a given experiment is
not fixed a priori; rather, it should depend on the sample size (or effective
noise level) of the randomized experiment for which it is deployed. To
instantiate and evaluate our framework, we employ our methodology in a large
corpus of randomized experiments from an industrial recommendation system and
construct proxy metrics that perform favorably relative to several baselines.
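One way to see the noise-dependence of the optimal proxy is a rough regression analogue: fit weights that combine historical proxy effects to predict the long-term effect, with a noise-dependent ridge term so that noisier experiments receive more heavily shrunk weights. This is a hypothetical sketch under simplified assumptions (known latent effects, two proxies, a plain ridge penalty standing in for the experiment's noise level), not the paper's estimator.

```python
import random

# hypothetical historical corpus of latent effects: two proxies and the
# long-term metric (the generative model below is made up for illustration)
random.seed(1)
corpus = []
for _ in range(200):
    theta = random.gauss(0.0, 1.0)            # latent long-term effect
    p1 = theta + random.gauss(0.0, 0.3)       # proxy 1 tracks theta closely
    p2 = 0.5 * theta + random.gauss(0.0, 0.8) # proxy 2 tracks it weakly
    corpus.append((p1, p2, theta))

def proxy_weights(corpus, noise=0.0):
    """Least-squares weights for combining the two proxies to predict the
    long-term effect, solved via 2x2 normal equations; `noise` is a ridge
    term standing in for the deploying experiment's measurement noise."""
    s11 = sum(p1 * p1 for p1, _, _ in corpus) + noise
    s22 = sum(p2 * p2 for _, p2, _ in corpus) + noise
    s12 = sum(p1 * p2 for p1, p2, _ in corpus)
    b1 = sum(p1 * t for p1, _, t in corpus)
    b2 = sum(p2 * t for _, p2, t in corpus)
    det = s11 * s22 - s12 * s12
    return ((s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)

w_low = proxy_weights(corpus, noise=0.0)    # precise experiment
w_high = proxy_weights(corpus, noise=500.0) # noisy experiment
print(w_low, w_high)
```

The low-noise weights load mostly on the better-tracking proxy, while the high-noise weights are shrunk toward zero, mirroring the insight that the best proxy combination depends on the effective noise level of the experiment it serves.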
The Role of Uric Acid in Acute and Chronic Coronary Syndromes.
Uric acid (UA) is the final product of the catabolism of endogenous and exogenous purine nucleotides. While its association with articular gout and kidney disease has long been known, new data have demonstrated that UA is also related to cardiovascular (CV) diseases. UA has been identified as a significant determinant of many different outcomes, such as all-cause and CV mortality, and of CV events (mainly Acute Coronary Syndromes (ACS) and even strokes). Furthermore, UA has been related to the development of Heart Failure, to higher mortality in decompensated patients, and to the onset of atrial fibrillation. After a brief introduction to the general role of UA in CV disorders, this review focuses on UA's relationship with CV outcomes, as well as on the specific features of patients with ACS and Chronic Coronary Syndrome. Finally, two open issues are discussed: the first concerns the identification of a CV UA cut-off value, while the second concerns whether the pharmacological reduction of UA can lower the incidence of CV events.
Mortality and pulmonary complications in patients undergoing surgery with perioperative SARS-CoV-2 infection: an international cohort study
Background: The impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) on postoperative recovery needs to be understood to inform clinical decision making during and after the COVID-19 pandemic. This study reports 30-day mortality and pulmonary complication rates in patients with perioperative SARS-CoV-2 infection. Methods: This international, multicentre, cohort study at 235 hospitals in 24 countries included all patients undergoing surgery who had SARS-CoV-2 infection confirmed within 7 days before or 30 days after surgery. The primary outcome measure was 30-day postoperative mortality and was assessed in all enrolled patients. The main secondary outcome measure was pulmonary complications, defined as pneumonia, acute respiratory distress syndrome, or unexpected postoperative ventilation. Findings: This analysis includes 1128 patients who had surgery between Jan 1 and March 31, 2020, of whom 835 (74·0%) had emergency surgery and 280 (24·8%) had elective surgery. SARS-CoV-2 infection was confirmed preoperatively in 294 (26·1%) patients. 30-day mortality was 23·8% (268 of 1128). Pulmonary complications occurred in 577 (51·2%) of 1128 patients; 30-day mortality in these patients was 38·0% (219 of 577), accounting for 81·7% (219 of 268) of all deaths. In adjusted analyses, 30-day mortality was associated with male sex (odds ratio 1·75 [95% CI 1·28–2·40], p<0·0001), age 70 years or older versus younger than 70 years (2·30 [1·65–3·22], p<0·0001), American Society of Anesthesiologists grades 3–5 versus grades 1–2 (2·35 [1·57–3·53], p<0·0001), malignant versus benign or obstetric diagnosis (1·55 [1·01–2·39], p=0·046), emergency versus elective surgery (1·67 [1·06–2·63], p=0·026), and major versus minor surgery (1·52 [1·01–2·31], p=0·047). Interpretation: Postoperative pulmonary complications occur in half of patients with perioperative SARS-CoV-2 infection and are associated with high mortality.
Thresholds for surgery during the COVID-19 pandemic should be higher than during normal practice, particularly in men aged 70 years and older. Consideration should be given to postponing non-urgent procedures and promoting non-operative treatment to delay or avoid the need for surgery. Funding: National Institute for Health Research (NIHR), Association of Coloproctology of Great Britain and Ireland, Bowel and Cancer Research, Bowel Disease Research Foundation, Association of Upper Gastrointestinal Surgeons, British Association of Surgical Oncology, British Gynaecological Cancer Society, European Society of Coloproctology, NIHR Academy, Sarcoma UK, Vascular Society for Great Britain and Ireland, and Yorkshire Cancer Research.
Bayesian Methods for Two-Sample Comparison
<p>Two-sample comparison is a fundamental problem in statistics. Given two samples of data, the interest lies in understanding whether the two samples were generated by the same distribution or not. Traditional two-sample comparison methods are not suitable for modern data, where the underlying distributions are multivariate and highly multi-modal and the differences across the distributions are often locally concentrated. The focus of this thesis is to develop novel statistical methodology for two-sample comparison that is effective in such scenarios. Tools from the nonparametric Bayesian literature are used to flexibly describe the distributions. Additionally, the two-sample comparison problem is decomposed into a collection of local tests on individual parameters describing the distributions. This strategy not only yields high statistical power, but also allows one to identify the nature of the distributional difference. In many real-world applications, detecting the nature of the difference is as important as the existence of the difference itself. Generalizations to multi-sample comparison and more complex statistical problems, such as multi-way analysis of variance, are also discussed.</p>
Analysis of Distributional Variation Through Graphical Multi-Scale Beta-Binomial Models
<p>Many scientific studies involve comparing multiple datasets collected under different conditions to identify the difference in the underlying distributions. A common challenge in these multi-sample comparison problems is the presence of overdispersion, or extraneous causes other than the conditions of interest that also contribute to the cross-sample difference, which frequently results in false findings—identified “differences” not replicable in follow-up studies. When proper replicate samples are available under the conditions, one can in principle separate the interesting distributional variation from overdispersion through what we call the “analysis of distributional variation” (ANDOVA). We introduce a fully probabilistic framework for ANDOVA that achieves high computational efficiency. We take a divide-and-conquer multi-scale inference strategy: (i) first transform a general nonparametric ANDOVA task into a collection of ANDOVA tasks on Binomial experiments—each characterizing variation in the distributions at a particular location and scale, (ii) address each Binomial ANDOVA using a Beta-Binomial (BB) model, and (iii) use hierarchical graphical modeling to combine the inference from the BB models. We derive an efficient MCMC-free Bayesian inference recipe under this framework through a combination of Laplace approximation-based numerical integration and message passing, and evaluate the performance of our method through extensive simulation. We apply the framework to analyzing DNase-seq data to identify differences in transcription factor binding. Supplementary material for this article is available online.</p>
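At a single location-and-scale node, step (ii) amounts to asking whether replicate Binomial counts under two conditions share one underlying mean rate (so the variation is overdispersion alone) or need condition-specific means. The following is a minimal sketch with a fixed overdispersion level and a grid approximation to the integral; the actual method uses Laplace approximation-based integration and message passing across the whole tree, and the counts and `tau` below are made up.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_bb(k, n, a, b):
    # Beta-Binomial log likelihood of k successes in n trials, dropping the
    # binomial coefficient (it cancels when comparing models on the same data)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def log_marginal(reps, tau=100.0, grid=50):
    """Log marginal likelihood of replicate (successes, trials) pairs whose
    Binomial rates are drawn from Beta(tau*m, tau*(1-m)); tau fixes the
    within-condition overdispersion, and the unknown mean rate m is
    integrated out over a uniform grid on (0, 1)."""
    ms = [(i + 0.5) / grid for i in range(grid)]
    logs = [sum(log_bb(k, n, tau * m, tau * (1.0 - m)) for k, n in reps)
            for m in ms]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / grid)

# hypothetical replicate counts (successes, trials) under two conditions
cond_a = [(12, 100), (18, 100), (15, 100)]
cond_b = [(40, 100), (45, 100), (38, 100)]

# 'difference' model: condition-specific mean rates
log_split = log_marginal(cond_a) + log_marginal(cond_b)
# 'no difference' model: one shared mean rate, overdispersion only
log_shared = log_marginal(cond_a + cond_b)
log_bf = log_split - log_shared  # positive favors a real cross-condition difference
print(log_bf)
```

Because each replicate keeps its own Beta-Binomial term, within-condition scatter is absorbed as overdispersion, and only variation beyond it pushes the log Bayes factor toward the "difference" model.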